Filtering Your Data

GVPT399F: Power, Politics, and Data

dplyr basics

  • First argument is always a data object (for example, a data frame).

  • Subsequent arguments typically describe which columns to operate on, using the variable names (without quotes).

  • Output is always a new data object; the input is left unchanged.
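
A minimal sketch of these three conventions, assuming the dplyr and gapminder packages are installed. The object name `recent` is only illustrative and does not come from the slides.

```r
library(dplyr)
library(gapminder)

# First argument: the data object. Later arguments: unquoted column names.
recent <- filter(gapminder, year == 2007)

# The result is a new object; gapminder itself keeps all of its rows.
nrow(recent)
nrow(gapminder)
```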

Filter rows with filter()

filter(gapminder, country == "Australia", year > 2000)
# A tibble: 2 × 6
  country   continent  year lifeExp      pop gdpPercap
  <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
1 Australia Oceania    2002    80.4 19546792    30688.
2 Australia Oceania    2007    81.2 20434176    34435.
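
Conditions separated by commas are combined with AND, so a row is kept only if it satisfies all of them. A short sketch of that equivalence, assuming dplyr and gapminder are installed (the names `with_comma` and `with_and` are illustrative):

```r
library(dplyr)
library(gapminder)

# Two conditions separated by a comma...
with_comma <- filter(gapminder, country == "Australia", year > 2000)

# ...select the same rows as the same conditions joined with &.
with_and <- filter(gapminder, country == "Australia" & year > 2000)

identical(with_comma, with_and)
```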

Filter rows with filter()

filter(gapminder, continent %in% c("Asia", "Oceania"))
# A tibble: 420 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 410 more rows
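
Here %in% works as shorthand for a chain of == tests joined with OR. A small sketch of that reading, assuming dplyr and gapminder are installed (the names `with_in` and `with_or` are illustrative):

```r
library(dplyr)
library(gapminder)

# Keep rows whose continent matches any value in the set.
with_in <- filter(gapminder, continent %in% c("Asia", "Oceania"))

# The same selection written out with == and |.
with_or <- filter(gapminder, continent == "Asia" | continent == "Oceania")

nrow(with_in)  # 420, matching the output above
nrow(with_or)
```

The %in% form stays readable as the set of values grows, which is the usual reason to prefer it.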

Filter rows with filter()

filter(gapminder, pop > 500000 & pop < 1000000)
# A tibble: 88 × 6
   country  continent  year lifeExp    pop gdpPercap
   <fct>    <fct>     <int>   <dbl>  <int>     <dbl>
 1 Bahrain  Asia       1992    72.6 529491    19036.
 2 Bahrain  Asia       1997    73.9 598561    20292.
 3 Bahrain  Asia       2002    74.8 656397    23404.
 4 Bahrain  Asia       2007    75.6 708573    29796.
 5 Botswana Africa     1962    51.5 512764      984.
 6 Botswana Africa     1967    53.3 553541     1215.
 7 Botswana Africa     1972    56.0 619351     2264.
 8 Botswana Africa     1977    59.3 781472     3215.
 9 Botswana Africa     1982    61.5 970347     4551.
10 Comoros  Africa     1997    60.7 527982     1174.
# ℹ 78 more rows

Filter rows with filter()

filter(gapminder, pop > 500000 | pop < 1000000)
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows
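
This filter returns all 1,704 rows because OR keeps a row when either condition is true, and every number is either greater than 500,000 or less than 1,000,000. A quick check, assuming dplyr and gapminder are installed (the name `either` and the three test values are illustrative):

```r
library(dplyr)
library(gapminder)

either <- filter(gapminder, pop > 500000 | pop < 1000000)

# No rows were dropped.
nrow(either) == nrow(gapminder)

# The condition holds for a small, a middling, and a large value alike.
vals <- c(100, 750000, 2000000)
vals > 500000 | vals < 1000000
```

To keep only populations between the two bounds, use & as in the previous example.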

Handy operations

== is equal to

!= is not equal to

> is greater than

< is less than

>= is greater than or equal to

<= is less than or equal to

Handy operations


| is OR (at least one condition is true)

& is AND (all conditions are true)

%in% is in (matches any value in a set)
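
These operators can be tried outside filter() on a plain vector, where each one returns TRUE or FALSE per element. A base R sketch with an illustrative vector `x` (not from the slides):

```r
x <- c(1, 5, 10)

x == 5            # FALSE  TRUE FALSE
x != 5            #  TRUE FALSE  TRUE
x >= 5            # FALSE  TRUE  TRUE
x <= 5            #  TRUE  TRUE FALSE
x < 5 | x > 5     #  TRUE FALSE  TRUE
x >= 5 & x <= 5   # FALSE  TRUE FALSE
x %in% c(1, 10)   #  TRUE FALSE  TRUE
```

filter() keeps the rows where the resulting value is TRUE.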